AITopics | minimax optimization

Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization

Neural Information Processing SystemsJun-17-2026, 07:29:23 GMT

Hierarchical optimization refers to problems with interdependent decision variables and objectives, such as minimax and bilevel formulations. While various algorithms have been proposed, existing methods and analyses lack adaptivity in stochastic optimization settings: they cannot achieve optimal convergence rates across a wide spectrum of gradient noise levels without prior knowledge of the noise magnitude. In this paper, we propose novel adaptive algorithms for two important classes of stochastic hierarchical optimization problems: nonconvex-strongly-concave minimax optimization and nonconvex-strongly-convex bilevel optimization. Our algorithms achieve sharp convergence rates of eO(1/ T + σ/T1/4) in T iterations for the gradient norm, where σ is an upper bound on the stochastic gradient noise. Notably, these rates are obtained without prior knowledge of the noise level, thereby enabling automatic adaptivity in both low and high-noise regimes. To our knowledge, this work provides the first adaptive and sharp convergence guarantees for stochastic hierarchical optimization. Our algorithm design combines the momentum normalization technique with novel adaptive parameter choices. Extensive experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.

artificial intelligence, machine learning, optimization, (17 more...)

Neural Information Processing Systems

Country: North America (0.27)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

56503192b14190d3826780d47c0d3bf3-Supplemental.pdf

Neural Information Processing SystemsApr-26-2026, 00:06:26 GMT

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

56503192b14190d3826780d47c0d3bf3-Paper.pdf

Neural Information Processing SystemsApr-26-2026, 00:06:22 GMT

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: Asia (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

254009e8d528f98764a060e877a1b01c-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-26-2026, 00:05:12 GMT

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Maryland (0.28)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Solving a Class of Non-Convex Minimax Optimization in Federated Learning

Neural Information Processing SystemsApr-26-2026, 00:05:08 GMT

The minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale distributed data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (i.e., single-machine). Nonetheless, the algorithm for minimax problems under FL is still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to O(ε 6). Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of O(κ3N 1ε 3) and the best-known communication complexity of O(κ2ε 2). FedSGDA-M is the first algorithm to match the best sample complexity O(ε 3) achieved by the single-machine method under the nonconvex-strongly-concave setting.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.28)
North America > United States > Maryland (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization

Neural Information Processing SystemsApr-25-2026, 06:34:42 GMT

We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely Random Reshuffling (RR), which shuffles the data every epoch, and Single Shuffling or Shuffle Once (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided Polyak-Łojasiewicz inequality. Finally, we demonstrate that our techniques are general enough to analyze the effect of data-ordering attacks, where an adversary manipulates the order in which data points are supplied to the optimizer. Our analysis also recovers tight rates for the incremental gradient method, where the data points are not shuffled at all.

artificial intelligence, international conference, machine learning, (13 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry: Information Technology (0.46)

Technology: